NSF PAR Search | NSF Public Access Repository

Summary statistics of learning link changing neural representations to behavior

https://doi.org/10.3389/fncir.2025.1618351

Zavatone-Veth, Jacob A; Bordelon, Blake; Pehlevan, Cengiz (August 2025, Frontiers in Neural Circuits)

How can we make sense of large-scale recordings of neural activity across learning? Theories of neural network learning with their origins in statistical physics offer a potential answer: for a given task, there is often a small set of summary statistics that are sufficient to predict performance as the network learns. Here, we review recent advances in how summary statistics can be used to build theoretical understanding of neural network learning. We then argue that this perspective can inform the analysis of neural data, enabling better understanding of learning in biological and artificial neural networks.
Full Text Available
How feature learning can improve neural scaling laws

https://doi.org/10.1088/1742-5468/adefb1

Bordelon, Blake; Atanasov, Alexander; Pehlevan, Cengiz (August 2025, Journal of Statistical Mechanics: Theory and Experiment)

We develop a solvable model of neural scaling laws beyond the kernel limit. Theoretical analysis of this model shows how performance scales with model size, training time, and the total amount of available data. We identify three scaling regimes corresponding to varying task difficulties: hard, easy, and super-easy tasks. For easy and super-easy target functions, which lie in the reproducing kernel Hilbert space (RKHS) defined by the initial infinite-width Neural Tangent Kernel (NTK), the scaling exponents remain unchanged between feature learning and kernel regime models. For hard tasks, defined as those outside the RKHS of the initial NTK, we demonstrate both analytically and empirically that feature learning can improve scaling with training time and compute, nearly doubling the exponent for hard tasks. This leads to a different compute-optimal strategy to scale parameters and training time in the feature learning regime. We support our finding that feature learning improves the scaling law for hard tasks but not for easy and super-easy tasks with experiments on nonlinear MLPs fitting functions with power-law Fourier spectra on the circle and CNNs learning vision tasks.
Full Text Available
Dynamics of finite width Kernel and prediction fluctuations in mean field neural networks

https://doi.org/10.1088/1742-5468/ad642b

Bordelon, Blake; Pehlevan, Cengiz (October 2024, Journal of Statistical Mechanics: Theory and Experiment)

We analyze the dynamics of finite width effects in wide but finite feature learning neural networks. Starting from a dynamical mean field theory description of infinite width deep neural network kernel and prediction dynamics, we provide a characterization of the $O(1/\sqrt{\text{width}})$ fluctuations of the dynamical mean field theory order parameters over random initializations of the network weights. Our results, while perturbative in width, unlike prior analyses, are non-perturbative in the strength of feature learning. We find that once the mean field/µP parameterization is adopted, the leading finite size effect on the dynamics is to introduce initialization variance in the predictions and feature kernels of the networks. In the lazy limit of network training, all kernels are random but static in time and the prediction variance has a universal form. However, in the rich, feature learning regime, the fluctuations of the kernels and predictions are dynamically coupled with a variance that can be computed self-consistently. In two-layer networks, we show how feature learning can dynamically reduce the variance of the final tangent kernel and final network predictions. We also show how initialization variance can slow down online learning in wide but finite networks. In deeper networks, kernel variance can dramatically accumulate through subsequent layers at large feature learning strengths, but feature learning continues to improve the signal-to-noise ratio of the feature kernels. In discrete time, we demonstrate that large learning rate phenomena such as edge of stability effects can be well captured by infinite width dynamics and that initialization variance can decrease dynamically. For convolutional neural networks trained on CIFAR-10, we empirically find significant corrections to both the bias and variance of network dynamics due to finite width.
Full Text Available
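
Illustrative note on the record above: the 1/sqrt(width) size of initialization fluctuations can be checked numerically in a very simple setting. The sketch below is not the paper's dynamical mean field theory. It only samples the output of a random two-layer tanh network at initialization, assuming a mean-field readout f(x) = (1/width) * sum_i a_i * tanh(w_i * x) with standard Gaussian weights. The function names, the scalar input, and the chosen widths are all hypothetical choices made for illustration.

```python
import math
import random
import statistics


def mean_field_output(width, x, rng):
    """Output of one randomly initialized two-layer tanh network.

    Assumed (illustrative) parameterization:
        f(x) = (1 / width) * sum_i a_i * tanh(w_i * x),
    with a_i and w_i drawn independently from N(0, 1).
    """
    total = 0.0
    for _ in range(width):
        a = rng.gauss(0.0, 1.0)  # readout weight
        w = rng.gauss(0.0, 1.0)  # first-layer weight
        total += a * math.tanh(w * x)
    return total / width


def output_std(width, n_inits=1000, x=1.0, seed=0):
    """Standard deviation of f(x) across n_inits random initializations."""
    rng = random.Random(seed)
    samples = [mean_field_output(width, x, rng) for _ in range(n_inits)]
    return statistics.pstdev(samples)


# Quadrupling the width should roughly halve the spread of the outputs,
# since the standard deviation is expected to scale as 1 / sqrt(width).
std_small = output_std(width=50)
std_large = output_std(width=200)
ratio = std_small / std_large
print(f"std at width 50:  {std_small:.4f}")
print(f"std at width 200: {std_large:.4f}")
print(f"ratio (expected to be near 2): {ratio:.2f}")
```

This covers only the static spread at initialization. The coupling between kernel and prediction fluctuations during training, which the abstract describes, is not captured by this sketch.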
Infinite Limits of Multi-head Transformer Dynamics

Bordelon, Blake; Chaudhry, Hamza T; Pehlevan, Cengiz (September 2024, Advances in Neural Information Processing Systems (NeurIPS) 2024)

Full Text Available
A Dynamical Model of Neural Scaling Laws

Bordelon, Blake; Atanasov, Alex; Pehlevan, Cengiz (May 2024, Forty-first International Conference on Machine Learning (ICML))

Full Text Available
Asymptotic Dynamics for Delayed Feature Learning in a Toy Model

Bordelon, Blake; Kumar, Tanishq; Gershman, Samuel J; Pehlevan, Cengiz (June 2024, High-dimensional Learning Dynamics 2024: The Emergence of Structure and Reasoning at ICML 2024)

We consider a toy model that exhibits grokking, recently advanced by [Kumar et al., 2023], and take advantage of the simple setting to derive the dynamics of the train and test loss using Dynamical Mean Field Theory (DMFT). This gives a closed-form expression for the gap between train and test loss that characterizes grokking in this toy model, illustrating how two parameters of interest (NTK alignment and network laziness) control the size of this gap and how grokking emerges as a uniquely offline property during repeated training over the same dataset. This is the first quantitative characterization of grokking dynamics in a general setting that makes no assumptions about weight decay, weight norm, etc.
Full Text Available
Self-consistent dynamical field theory of kernel evolution in wide neural networks

https://doi.org/10.1088/1742-5468/ad01b0

Bordelon, Blake; Pehlevan, Cengiz (November 2023, Journal of Statistical Mechanics: Theory and Experiment)

We analyze feature learning in infinite-width neural networks trained with gradient flow through a self-consistent dynamical field theory. We construct a collection of deterministic dynamical order parameters which are inner-product kernels for hidden unit activations and gradients in each layer at pairs of time points, providing a reduced description of network activity through training. These kernel order parameters collectively define the hidden layer activation distribution, the evolution of the neural tangent kernel (NTK), and consequently, output predictions. We show that the field theory derivation recovers the recursive stochastic process of infinite-width feature learning networks obtained by Yang and Hu with tensor programs. For deep linear networks, these kernels satisfy a set of algebraic matrix equations. For nonlinear networks, we provide an alternating sampling procedure to self-consistently solve for the kernel order parameters. We provide comparisons of the self-consistent solution to various approximation schemes including the static NTK approximation, gradient independence assumption, and leading order perturbation theory, showing that each of these approximations can break down in regimes where general self-consistent solutions still provide an accurate description. Lastly, we provide experiments in more realistic settings which demonstrate that the loss and kernel dynamics of convolutional neural networks at fixed feature learning strength are preserved across different widths on an image classification task.
Full Text Available
Dynamics of Finite Width Kernel and Prediction Fluctuations in Mean Field Neural Networks

Bordelon, Blake; Pehlevan, Cengiz (September 2023, Thirty-seventh Conference on Neural Information Processing Systems (NeurIPS))

Full Text Available
Depthwise Hyperparameter Transfer in Residual Networks: Dynamics and Scaling Limit

Bordelon, Blake; Noci, Lorenzo; Li, Mufan B; Hanin, Boris; Pehlevan, Cengiz (January 2024, The Twelfth International Conference on Learning Representations (ICLR))

Full Text Available
Depthwise Hyperparameter Transfer in Residual Networks: Dynamics and Scaling Limit

Bordelon, Blake; Noci, Lorenzo; Li, Mufan Bill; Hanin, Boris; Pehlevan, Cengiz (January 2024, Proceedings of International Conference on Learning Representations 2024)

Full Text Available